DAOS-16964 ddb: read-only mode requires enabling Copy-on-Write for...#17533
Conversation
…EMOBJ pools. PMEMOBJ maintains its own metadata. Copy-on-Write prevents these changes from taking effect so read-only mode will be truly read-only. Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Ticket title is 'Opening a VOS file in read only mode with DDB modifies the VOS file'
* However, since none of these changes need to be persisted when the pool is opened in
* read-only mode (write_mode == false), we can work around this by mapping the pool using
* copy-on-write. Copy-on-write allows pages to be read normally, but when a page is
* modified, a new private copy is allocated. As a result, any changes made to
If a new private copy is allocated, then when the vos pool is opened again, will operations (read/write) work against the original pages or the new copy?
On the other hand, under non-interactive mode, many ddb sub-commands open the vos pool implicitly as read-only, such as ddb ls. If this is repeated many times (for different items), will that cause trouble for copy-on-write? In the extreme case, could ddb ls fail because of ENOSPC?
If a new private copy is allocated, then when the vos pool is opened again, will operations (read/write) work against the original pages or the new copy?
This is a volatile copy of modified pages in RAM. These pages are freed once you close the pool / call munmap().
Please see mmap() and MAP_PRIVATE for details.
On the other hand, under non-interactive mode, many ddb sub-commands open the vos pool implicitly as read-only, such as ddb ls. If this is repeated many times (for different items), will that cause trouble for copy-on-write? In the extreme case, could ddb ls fail because of ENOSPC?
If it is truly read-only, I assume no changes will be made no matter the number of items being processed. So I expect that after startup, when a few copied pages are allocated (for various PMEMOBJ purposes), the memory consumption in this department ought to stay flat.
If, despite the assumption of not making any changes to the pool, VOS actually writes to the pool when the user (DDB) requests merely to read the data, the memory consumption will continue to grow. The final signal would be ENOMEM.
If the latter is the case, I would argue we should rethink whether we are even able to provide a read-only mode.
I am not sure about this. It would be better to run some tests with your patch, such as a repeated ddb ls.
If the latter is the case, I would argue we should rethink whether we are even able to provide a read-only mode.
I do not know how the implicit modification is triggered when opening the vos pool, or what the potential side effects are. According to your description in DAOS-16964, both the vos file timestamp and some other metadata are changed.
Will that potentially damage the vos pool? If it is harmful, does it mean that even if we did not run ddb, the related risk would still be there when the DAOS engine normally opens the vos pool? Otherwise, if it is harmless, can we ignore such implicit modification to avoid the CoW trouble?
Will that potentially damage the vos pool?
Yes, it can. Imagine, for example, that the PMEMOBJ ulog is corrupted. If we have bad luck, PMEMOBJ may apply these malicious changes and further mangle the VOS pool contents, which may prevent us from finding the root cause of whatever problem we are trying to pinpoint.
Does it mean that even if we did not run ddb, the related risk would still be there when the DAOS engine normally opens the vos pool?
Yes. At the moment, opening the VOS pool always does the same things underneath, no matter whether the open is supervised by DDB or daos_engine.
can we ignore such implicit modification to avoid CoW trouble?
It looks like it is not harmless in general. But regardless, I think CoW is worth the hassle because, in my opinion, we need a read-only mode. Otherwise, DDB is not really a debug tool, since we cannot control what it does to the VOS file.
- I added a test which shows that CoW prevents changes from reaching the backing file, and as a result mtime also stays the same. 😁
- I have done some measurements to check the memory consumption in both cases. TBH, the results are odd. I wrote my observations in the ticket: https://daosio.atlassian.net/browse/DAOS-16964?focusedCommentId=153817 Regardless, I think we should proceed with this patch and maybe create a new ticket to understand better why this is the case. That would be beneficial in three ways:
  - we could potentially decrease the memory usage, assuming some of these operations are not necessary for DDB to function
  - if these operations are not optimal, maybe there is room for performance gains for both DDB and daos_engine
  - it is not good that both of us are surprised by this result (I assume you did not know it is the case, did you? 😉 ), so whatever we find out will improve our understanding of how VOS really works
Master PRs currently require merge approval. I added
Ref: #7153
src/vos/vos_pool.c
Shall we also remove this munlock() function call?


... PMEMOBJ pools
PMEMOBJ maintains its own metadata. Copy-on-Write prevents these changes from taking effect so read-only mode will be truly read-only.
Also removing the `mlock()` workaround because:
- `mlock()` leads to increased memory usage, since the entire pool is pulled into RAM when it is opened, while `mlock()` itself serves no role whatsoever.
- `mlock()` has been unnecessary for quite some time. It was originally added to work around a cryptic issue observed when using libfabric with the verbs provider and performing direct RDMA writes into pool memory. Direct RDMA writes to pool memory are no longer used, so the workaround is obsolete. For details, please see the ticket for the complete paper trail.